The Relationship Between Movement (Bus Ridership) and Motivation (POI) in Singapore

TEAM MEMBERS: Chen MO, Jia Zheng, Zhang Ruijia, Liang Weiyi

Introduction

In a city, people move around with specific intentions: they may commute between two locations, or travel somewhere else to spend time at the weekend. The places where activities happen and the resulting traffic flows are therefore closely related [1]. Transportation planners have long been interested in the relationship between people's movements and their motivations [2], with the aim of building models of urban transportation and human activity. Using origin-destination (OD) data as an indicator of movement and point-of-interest (POI) data as an indicator of motivation, this report examines that relationship in the context of Singapore. It begins with a brief review of existing studies involving OD and POI data, then introduces the adapted methodology, presents the results and visualizations, and finally draws conclusions from the analyses.

Literature Review

i. Application of OD data

Flow mapping is probably the most widely explored application of origin-destination (OD) data. A flow map connects endpoints with straight or curved lines to show the directions and volumes of movements. Refinements that reduce visual clutter include flowstrates [3], OD maps [4], and waypoint-constrained OD views [5, 6]. However, most of these approaches focus on the spatial direction of flows and pay little attention to volumes or to spatio-temporal relationships.

ii. Application of POI data

Point-of-interest (POI) data have been applied to commercial network planning, for example to find locations for facilities such as post offices and lottery terminals [7]. Research has extended POI records to multi-dimensional attributes, including longitude and latitude, place names, categories, and additional text. However, most studies focus on collecting and representing this information, and they draw few meaningful conclusions from the POI data themselves.

iii. Relationships between movements and motivations

The relationship between movements and motivations has become a focus for urban researchers and planners [8]. Earlier studies used traditional surveys of small samples to estimate population flows [9], which required intensive labor and cost. Technological advances now offer alternatives. A number of researchers have used mobile phone data to estimate mobility patterns [10, 11, 12, 13]. With activity-based approaches, they convert mobile phone call detail record (CDR) data into individual mobility statistics. Taxi GPS traces are another reliable data source [14] and have been used to predict passenger demand. These methods provide a big-picture view, but they lack comparative analyses of specific types of activities and locations [15]. To help researchers explore the relationship between motivation and mobility, Wei Zeng et al. proposed a compact visual representation, the POI-mobility signature. It analyzes OD data, which represent mobility, together with POI data, which represent motivation.

Methodology

To narrow the scope of the research, we focus on the relationship between passenger volumes at bus stops and the most relevant POIs. These POIs are grouped into five categories: amenity, business, public service, education, and healthcare. We first inspected and collated the OD and POI data and carried out overview analyses. We then refined the study in both the spatial and temporal dimensions. The steps are as follows:

  1. Data preprocessing;
  2. Spatial visualization of OD data and POI data;
  3. Density-based spatial clustering of applications with noise (DBSCAN) to preliminarily explore the relationship between bus stops and POIs;
  4. Correlation analysis and machine learning to evaluate the relationship between POI distribution and passenger volumes;
  5. Temporal visualization of OD data;
  6. Radial (polar) graphs and pie charts to visualize volume trends in the spatial-temporal dimension.

Data Pre-processing

# JSON files were converted to CSV and missing fields were completed beforehand.
library(tidyverse)  # readr, dplyr, tibble, ggplot2
library(sf)

# csv
bus <- read_csv("groupdata/transport_node_bus_202001 (1).csv") # OD data: tap-in/tap-out volume per bus stop, day type and hour
POI <- read_csv("groupdata/POI/POI.csv") # POI data
loca <- read_csv("groupdata/location.csv") # location of each bus stop

# shp
PA <- read_sf("groupdata/singapore-residents-by-planning-area-and-type-of-dwelling-june-2016-shp")
PA <- st_transform(PA, crs = 4326)
# classify the POIs; case_when() keeps the first matching branch,
# so each interest is listed under exactly one category
POI %>%
  mutate(category = case_when(
    interest %in% c("Food_court", "Bar", "Attraction", "Cinema", "Theatre") ~ "Amenity",
    interest %in% c("Commercial", "Office") ~ "Business",
    interest %in% c("Post_office", "Police", "Bank", "Community_centre", "Social_facility", "Retail") ~ "Public_service",
    interest %in% c("Library", "College", "School", "University", "Kindergarten") ~ "Education",
    interest %in% c("Hospital", "Dentist", "Childcare", "Clinic") ~ "Healthcare"
  )) -> POI
# POI to sf points (X = longitude, Y = latitude)
poi <- st_as_sf(POI, coords = c("X", "Y"), crs = 4326)
# total tap-in/tap-out volume per bus stop, joined to the stop coordinates
bus %>%
  group_by(PT_CODE) %>%
  summarise(TAP_IN = sum(TOTAL_TAP_IN_VOLUME),
            TAP_OUT = sum(TOTAL_TAP_OUT_VOLUME)) -> stop
stop <- left_join(stop, loca, by = c("PT_CODE" = "BusStopCode")) %>%
  filter(!is.na(Longitude), !is.na(Latitude))
# OD data to sf points (one row per stop, day type and hour)
geobus <- left_join(bus, loca, by = c("PT_CODE" = "BusStopCode"))
geobus <- tibble::rowid_to_column(geobus, "ID")
geobus %>%
  filter(!is.na(Longitude), !is.na(Latitude)) -> geobus
geo <- st_as_sf(geobus, coords = c("Longitude", "Latitude"), crs = 4326)
# bus stops to sf points
geostp <- st_as_sf(stop, coords = c("Longitude", "Latitude"), crs = 4326)
geostp_nogeo <- st_drop_geometry(geostp)
# attach the planning area (PA) to each bus stop
sf::sf_use_s2(FALSE)
joined_data <- st_join(geostp, PA, join = st_intersects)
joined_data <- st_set_crs(joined_data, 4326)
# number of bus stops (n) per PA; TOTAL is carried over from the PA layer, not from the bus data
joined_data %>%
  group_by(PLN_AREA_N) %>%
  count(TOTAL) %>%
  st_drop_geometry() -> jPA
jPA <- merge(PA, jPA)
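In a `case_when()` chain the first matching branch wins, so an interest listed under two categories is silently assigned to the earlier one. The base-R sketch below shows an alternative, assuming the interest labels used above. It keeps the mapping in one named vector, so each interest has exactly one category and any duplicate key is caught immediately. `category_map` and `classify_interest` are hypothetical names and are not part of the original pipeline.

```r
# Hypothetical lookup table: one entry per interest label (labels from the case_when above)
category_map <- c(
  Food_court = "Amenity", Bar = "Amenity", Attraction = "Amenity",
  Cinema = "Amenity", Theatre = "Amenity",
  Commercial = "Business", Office = "Business",
  Post_office = "Public_service", Police = "Public_service", Bank = "Public_service",
  Community_centre = "Public_service", Social_facility = "Public_service",
  Retail = "Public_service",
  Library = "Education", College = "Education", School = "Education",
  University = "Education", Kindergarten = "Education",
  Hospital = "Healthcare", Dentist = "Healthcare", Childcare = "Healthcare",
  Clinic = "Healthcare"
)

# Fail early if any interest label appears more than once
stopifnot(!anyDuplicated(names(category_map)))

classify_interest <- function(interest) {
  # Interests missing from the table become NA, like case_when() with no matching branch
  unname(category_map[interest])
}

classify_interest(c("Bar", "Kindergarten", "Clinic", "Parking"))
# "Amenity" "Education" "Healthcare" NA
```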

Results

1. Spatial Analysis

1.1 Distribution of Bus Stops and Bus Stop Passenger Volumes

We start with an overview of bus stops across the city. The two maps show the number of bus stops and their total passenger volume in each planning area (PA). Bedok has the most bus stops as well as the largest passenger volume. Areas with the fewest bus stops, including Central Water Catchment, Mandai, Choa Chu Kang, Tanglin, and Paya Lebar, carry the smallest volumes. Tuas is an exception to this correlation: it has a considerable number of bus stops but small passenger volumes. This is probably because its land use is almost exclusively industrial.

# Distribution of Bus Stops and Bus Stop Passenger Volumes
library(tmap)
library(gridExtra)  # grid.arrange

tmap_mode('plot')

# total tap-in volume per PA, summed over the bus stops inside it
# (the TOTAL column of jPA comes from the PA layer, not from the bus data)
joined_data %>%
  st_drop_geometry() %>%
  group_by(PLN_AREA_N) %>%
  summarise(TAP_IN = sum(TAP_IN)) -> pa_volume
jPA_vol <- merge(jPA, pa_volume)

bus_number <- tm_shape(jPA) +
  tm_polygons(col = "n", border.col = 'grey95', palette = 'RdPu', title = 'Number') +
  tm_layout(title = 'Number of Bus Stops in Each Planning Area',
            inner.margins = c(.02, .02, .02, .02)) +
  tm_credits("Data: LTA DataMall")

bus_volume <- tm_shape(jPA_vol) +
  tm_polygons(col = "TAP_IN", palette = "Reds", border.col = 'white', title = 'Total Volume') +
  tm_layout(title = 'Total Tap in Volume in Each Planning Area',
            inner.margins = c(.02, .02, .02, .02)) +
  tm_credits("Data: LTA DataMall")

tmap_arrange(bus_number, bus_volume, nrow = 1, ncol = 2)

# bus stop density (stops per unit of SHAPE_Area) in each PA
bus_number_2 <- jPA %>%
  ggplot(mapping = aes(y = reorder(PLN_AREA_N, n / SHAPE_Area), x = n / SHAPE_Area, fill = n / SHAPE_Area)) +
  geom_col(width = .7) +
  theme(panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank()) +
  labs(x = "density", y = "planning areas",
       title = "Density of Bus Stops within Each PA",
       subtitle = "sorted by density (in descending order)",
       caption = 'Data: LTA DataMall',
       fill = "density scale") +
  scale_fill_distiller() +
  theme(axis.text.y = element_text(size = 7), axis.text.x = element_text(size = 7))

# total tap-in volume in each PA, listed in the same order as the density chart
bus_volume_2 <- jPA_vol %>%
  ggplot(mapping = aes(x = TAP_IN, y = reorder(PLN_AREA_N, n / SHAPE_Area), fill = TAP_IN)) +
  geom_col(width = .7) +
  theme(panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank()) +
  labs(x = "volume", y = "planning areas",
       title = "Total Tap in Volume within Each PA",
       subtitle = "in the same order as the density chart",
       caption = 'Data: LTA DataMall',
       fill = "volume scale") +
  scale_fill_distiller(palette = "RdPu") +
  theme(axis.text.y = element_text(size = 7), axis.text.x = element_text(size = 7))

grid.arrange(bus_number_2, bus_volume_2, nrow = 1, ncol = 2)

1.2 POI Distribution

The spatial analysis of POI data is visualized on the following maps.

i. Overview of POIs of different categories distributed among PAs

POIs are densest around the Downtown district.

# 1.2 POI Data
tm_shape(PA) +
  tm_borders(alpha = 0.2)+
  tmap_options(check.and.fix = TRUE) +
tm_shape(poi) +
  tm_dots(col="category", size=0.06)+
  tm_layout(title = 'Distribution of Each Category of POI') +
  tm_credits("Data: LTA DataMall;OSM")

ii. Numbers of Categorized POIs in PAs
#ii. Numbers of Categorized POIs in PAs

# attach the planning area to every POI (joined_poi is also used in the sections below)
joined_poi <- st_join(poi, PA, join = st_intersects)

# the five POI categories (sort() drops uncategorized NA values)
label <- data.frame(category = sort(unique(poi$category)))

# POI count per PA, one map per category
tmap_mode('view')

a <- list()
for (i in 1:nrow(label)) {
  joined_poi %>%
    st_drop_geometry() %>%
    filter(category == label[i, ]) %>%
    count(PLN_AREA_N) -> counts
  counts <- merge(PA, counts)
  a[[i]] <- tm_shape(PA) +
    tm_credits("Data: LTA DataMall;OSM") +
    tm_borders() +
    tm_shape(counts) +
    tm_polygons(col = "n", palette = "BuPu", title = label[i, ]) +
    tmap_options(check.and.fix = TRUE)
}

# POI count per PA, all categories together
joined_poi %>%
  st_drop_geometry() %>%
  count(PLN_AREA_N) -> counts_all
counts_all <- merge(PA, counts_all)

a[[6]] <- tm_shape(PA) +
  tm_credits("Data: LTA DataMall;OSM") +
  tm_borders() +
  tm_shape(counts_all) +
  tm_polygons(col = "n", palette = "BuPu", title = "All Category") +
  tmap_options(check.and.fix = TRUE)

tmap_arrange(a[[1]], a[[2]], a[[3]], a[[4]], a[[5]], a[[6]])

Most PAs have a similar density across all five POI categories. The exception is Central Water Catchment, which contains amenity POIs only. We weighed the number of bus stops, the bus stop passenger volumes, and the distribution of POIs within each PA. On that basis, we selected four representative PAs for the subsequent study: Ang Mo Kio, Jurong West, Queenstown, and Bedok. Following these overview analyses, we apply DBSCAN and machine learning to test the relationship between bus stops and POIs in the spatial dimension.

1.3 Density-Based Spatial Clustering of Applications with Noise (DBSCAN) of Bus Stops and POIs

DBSCAN is applied to both bus stops and POIs so that the spatial patterns of the two can be compared. A location counts as highly accessible, or a core point, when at least five locations (MinPts = 5) lie within a radius of 250 meters (eps = 250). Taking bus stops as an example, core stops whose 250-meter neighborhoods overlap are chained into one cluster. Stops within 250 meters of a core stop join that cluster as border points, even if they have fewer neighbors themselves. The remaining stops are treated as noise. In this way, bus stop locations are divided into separate zones, and the resulting graph shows the zoning of bus stops in Singapore.
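The core, border, and noise logic described above can be sketched in base R. This is a simplified illustration, not the `fpc::dbscan` implementation the report uses. It assumes planar coordinates in meters and counts a point as part of its own neighborhood, which is the convention documented for the dbscan package. Check the fpc documentation for its exact `MinPts` convention. `dbscan_sketch` and the toy stop coordinates are hypothetical.

```r
# Minimal DBSCAN sketch in base R. Coordinates are assumed planar, in meters,
# and a point counts toward its own neighborhood.
dbscan_sketch <- function(xy, eps, min_pts) {
  n <- nrow(xy)
  d <- as.matrix(dist(xy))                              # pairwise Euclidean distances
  neighbours <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  is_core <- lengths(neighbours) >= min_pts             # dense enough to seed or extend a cluster
  cluster <- integer(n)                                 # 0 = noise (or not yet reached)
  next_id <- 0L
  for (i in seq_len(n)) {
    if (!is_core[i] || cluster[i] != 0L) next           # clusters start only at unassigned core points
    next_id <- next_id + 1L
    queue <- i
    while (length(queue) > 0) {
      p <- queue[1]
      queue <- queue[-1]
      if (cluster[p] != 0L) next
      cluster[p] <- next_id
      # only core points pass the cluster on; border points join but do not expand it
      if (is_core[p]) queue <- c(queue, neighbours[[p]][cluster[neighbours[[p]]] == 0L])
    }
  }
  list(cluster = cluster, is_core = is_core)
}

# Hypothetical stop coordinates in meters
stops <- rbind(c(0, 0), c(100, 0), c(0, 100), c(100, 100), c(50, 50),
               c(320, 50), c(2000, 2000))
res <- dbscan_sketch(stops, eps = 250, min_pts = 5)
res$cluster  # 1 1 1 1 1 1 0
res$is_core  # TRUE TRUE TRUE TRUE TRUE FALSE FALSE
```

The first six stops form one zone. The sixth is only a border point, because just three stops (itself included) lie within 250 m of it. The distant seventh stop is noise.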

a) Bus Stop
# convert lat/lon into planar x/y coordinates in meters
# (SoDA::geoXY expects latitude first, then longitude)
library(SoDA)        # geoXY
library(fpc)         # dbscan
library(factoextra)  # fviz_cluster

xy <- as.data.frame(geoXY(stop$Latitude, stop$Longitude, unit = 1))
stop$x <- xy$X  # east-west offset (m)
stop$y <- xy$Y  # north-south offset (m)

# DBSCAN for bus stops: eps = 250 m, MinPts = 5
set.seed(123)
stop_xy <- as.data.frame(stop[, c("x", "y")])
test <- fpc::dbscan(stop_xy, eps = 250, MinPts = 5)
fviz_cluster(test, data = stop_xy, stand = FALSE, show.clust.cent = FALSE,
             geom = "point", palette = "Set2", ggtheme = theme_minimal()) +
  labs(title = "Cluster of Bus Stop in Singapore",
       caption = "Data: LTA DataMall") +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        legend.position = "none")

The same approach is applied to the POIs.

b) POI
# convert lat/lon into planar x/y coordinates in meters
# (POI$Y is latitude, POI$X is longitude; SoDA::geoXY takes latitude first)
zw <- as.data.frame(geoXY(POI$Y, POI$X, unit = 1))
POI_zw <- POI
POI_zw$x <- zw$X  # east-west offset (m)
POI_zw$y <- zw$Y  # north-south offset (m)

# DBSCAN for POIs: eps = 250 m, MinPts = 5
set.seed(123)
poi_xy <- as.data.frame(POI_zw[, c("x", "y")])
poi_test <- fpc::dbscan(poi_xy, eps = 250, MinPts = 5)
fviz_cluster(poi_test, data = poi_xy, stand = FALSE, show.clust.cent = FALSE,
             geom = "point", palette = "Set2", ggtheme = theme_minimal()) +
  labs(title = "Cluster of POI in Singapore",
       caption = "Data: LTA DataMall;OSM") +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        legend.position = "none")

DBSCAN divides the bus stops of Singapore into eight zones. The central cluster covers the largest area, with seven smaller ones scattered around it, which suggests good mobility within the central zone. The POI result shows the same pattern, again with a large central cluster. This visual similarity points to a potential relationship between the distribution of bus stops and that of POIs, which the following analyses explore further.

1.4 Correlation Analysis and Machine Learning Models for the Relationship between POI and Passenger Volume

We conduct a Pearson correlation analysis between passenger volumes and the number of POIs of each category near each bus stop. A POI is defined as “nearby” if it lies within 500 meters of the stop, and a loop counts the nearby POIs of each category for every stop. Based on the distribution shown in the histogram, stops with extreme volumes (tap-in volume of 200,000 or more) are filtered out as outliers.
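The nearby-POI count can also be sketched without sf. The sketch assumes WGS84 longitude/latitude inputs and a spherical Earth with a mean radius of 6,371,008.8 m. The report itself uses `st_buffer()` and `st_intersects()`. `haversine_m`, `count_nearby`, and the toy coordinates below are hypothetical.

```r
# Haversine great-circle distance in meters on a spherical Earth
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Count, for every stop, the POIs of each category within `radius` meters
count_nearby <- function(stops, pois, radius = 500) {
  cats <- sort(unique(pois$category))
  counts <- vapply(seq_len(nrow(stops)), function(i) {
    d <- haversine_m(stops$lon[i], stops$lat[i], pois$lon, pois$lat)
    as.integer(table(factor(pois$category[d <= radius], levels = cats)))
  }, integer(length(cats)))
  # vapply returns one column per stop; reshape to one row per stop
  matrix(counts, nrow = nrow(stops), byrow = TRUE, dimnames = list(stops$id, cats))
}

stops <- data.frame(id = c("S1", "S2"), lon = c(103.85, 103.95), lat = c(1.30, 1.40))
pois <- data.frame(
  lon = c(103.853, 103.85, 103.85, 103.85),
  lat = c(1.300, 1.306, 1.301, 1.300),
  category = c("Education", "Healthcare", "Healthcare", "Education")
)
count_nearby(stops, pois)
#    Education Healthcare
# S1         2          1
# S2         0          0
```

The Healthcare POI about 667 m north of S1 falls outside the 500 m radius, so S1 counts only one Healthcare POI.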

# pre-processing: 500 m buffers around all bus stops
library(scales)  # comma() axis labels

# 1. create buffers in a projected CRS (SVY21, EPSG:3414) so that 500 is in meters
bus_500 <- st_buffer(st_transform(geostp, 3414), 500)
poi_3414 <- st_transform(poi, 3414)

bus_corre <- st_drop_geometry(bus_500)

# 2. count the POIs of each category inside each stop's buffer
for (category_name in c("Amenity", "Business", "Public_service", "Education", "Healthcare")) {
  poi_cat <- filter(poi_3414, category == category_name)
  bus_corre[[paste0(category_name, "_500")]] <- lengths(st_intersects(bus_500, poi_cat))
}
bus_poi <- bus_corre

ggplot(data = bus_poi, mapping = aes(x = TAP_IN)) +
  geom_histogram(fill = "#b3cde3", colour = 'white') +
  scale_x_continuous(labels = comma, limits = c(0, 400000)) +
  labs(title = 'Distribution of TAP_IN data (Number)',
       caption = "Data:LTA DataMall; OSM") + theme_minimal()

# drop outlier stops with very large volumes
bus_poi %>%
  filter(TAP_IN < 200000) -> bus_poi

Correlation Analysis

The correlation plot shows that all five POI categories have weak positive correlations with passenger volume. Public service, education, and healthcare POIs are comparatively more strongly correlated with passenger volume. Tap-in and tap-out volumes show similar correlation patterns, so tap-in volume is used as a proxy for passenger volume in the following steps. Other factors may also affect passenger volumes, so the correlation analysis alone cannot define the relationships among these variables.
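Each coefficient in the correlation plot is a Pearson correlation. The minimal base-R sketch below applies the formula to hypothetical numbers and checks that it matches `cor()`. Pearson's r measures only linear association, so a weak r rules out a strong linear relationship but not other kinds of dependence. `pearson_r` and the toy data are illustrative assumptions.

```r
# Pearson correlation from its definition: the sum of co-deviations divided by
# the square root of the product of the summed squared deviations
pearson_r <- function(x, y) {
  xc <- x - mean(x)
  yc <- y - mean(y)
  sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2))
}

# Hypothetical nearby-POI counts and tap-in volumes for four stops
nearby_poi <- c(2, 4, 6, 8)
tap_in_toy <- c(1000, 3000, 2000, 5000)

pearson_r(nearby_poi, tap_in_toy)                                   # about 0.83
all.equal(pearson_r(nearby_poi, tap_in_toy), cor(nearby_poi, tap_in_toy))  # TRUE
```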

library(corrplot)

# correlation matrix of volumes and nearby (500 m) POI counts
bus_poi %>%
  ungroup() %>%
  select(TAP_IN, TAP_OUT,
         Amenity = Amenity_500, Business = Business_500,
         Public_service = Public_service_500,
         Education = Education_500, Healthcare = Healthcare_500) -> corr_bus_poi
corr_bp <- cor(corr_bus_poi)
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(corr_bp, method = "color", type = "lower", col = col(200),
         tl.srt = 0, addCoef.col = "black", tl.col = "black",
         diag = FALSE)
mtext("Data:LTA DataMall; OSM", at = 7, line = -27.5, cex = 0.9)

We then build two machine learning models, linear regression and random forest. Both predict the tap-in volume of a stop from the numbers of nearby POIs in the five categories.
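The split-then-scale step can be illustrated with a small base-R sketch using hypothetical volumes. `fit_range` and `apply_range` are illustrative helpers, assumed to mirror what `caret::preProcess(method = "range")` does: ranges are learned on the training data and reused unchanged on the test data. Because `TAP_IN` is part of the scaled training frame, the RMSE and MAE values reported below are in scaled units rather than passenger counts.

```r
# Min-max scaling: ranges are learned from the training split only and then
# applied unchanged to the test split
fit_range <- function(x) list(min = min(x), max = max(x))
apply_range <- function(x, r) (x - r$min) / (r$max - r$min)

train_tap_in <- c(1000, 5000, 20000, 41000)  # hypothetical training volumes
test_tap_in <- c(3000, 45000)                # hypothetical test volumes

r <- fit_range(train_tap_in)
apply_range(train_tap_in, r)  # 0.000 0.100 0.475 1.000
apply_range(test_tap_in, r)   # 0.05 1.10: test values may fall outside [0, 1]
```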

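library(caret)       # createDataPartition, preProcess, train, postResample
library(knitr)       # kable
library(kableExtra)  # kable_styling, row_spec

set.seed(123)
tap_in = bus_poi %>%
  ungroup() %>%
  select(TAP_IN, Amenity_500, Business_500, Public_service_500, Education_500, Healthcare_500)
# stratified 90/10 split of the stops into training and test sets
vi = createDataPartition(tap_in$TAP_IN, p=0.90, list=FALSE)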

train = tap_in[vi,]
test = tap_in[-vi,]

# min-max normalize
pp = preProcess(train, method = "range")
train = predict(pp, train)
test = predict(pp, test)


# baseline
baseline =mutate(test, pred =  mean(train$TAP_IN))
m_lm <- train(TAP_IN ~ ., data=train, method="lm")
predicted_lm = predict(m_lm, test)
m_rf <- train(TAP_IN ~ ., data=train, method="rf")
predicted_rf = predict(m_rf, test)
df_perf = data.frame(matrix(ncol=4, nrow=0, dimnames=list(NULL, c("name", "RMSE", "RSquared", "MAE"))))
df_perf[nrow(df_perf)+1,] = append("Baseline: TAP_IN", postResample(pred = baseline$pred, obs = baseline$TAP_IN))
df_perf[nrow(df_perf)+1,] = append("Linear Regression: TAP_IN", postResample(pred = predicted_lm, obs = test$TAP_IN))
df_perf[nrow(df_perf)+1,] = append("Random Forest: TAP_IN", postResample(pred = predicted_rf, obs = test$TAP_IN))
df_perf$MAE <- as.numeric(df_perf$MAE)
df_perf$RSquared <- as.numeric(df_perf$RSquared)
df_perf$RMSE <- as.numeric(df_perf$RMSE)

kable(df_perf,
      col.names = c("Model", "RMSE","RSquared", "MAE"),digits=4) %>%
kable_styling(c("striped", "condensed")) %>%
row_spec(1,  color = "black", background = "#fbb4ae") %>%
row_spec(2, color = "black", background = "#b3cde3") %>%
row_spec(3, bold = T, color = "black", background = "#ccebc5")
Model                       RMSE    RSquared  MAE
Baseline: TAP_IN            0.1288  NA        0.0874
Linear Regression: TAP_IN   0.1226  0.0948    0.0826
Random Forest: TAP_IN       0.1138  0.2203    0.0736
observed=test$TAP_IN
p1 <- ggplot(data=data.frame(observed, predicted_lm), mapping=aes(x=predicted_lm, y=observed)) + 
  geom_point()+ geom_smooth() + theme_minimal()+
  scale_x_continuous(labels=comma, limits = c(0.05, 0.3)) +
  scale_y_continuous(labels=comma, limits = c(0.05, 0.3)) +
  # range is added here to remove outliers 
  annotate("text", x=0.3,y=0.3,size=4,hjust=1,label=paste('r =', round(cor(predicted_lm,observed), 2)))
 
p2 <- ggplot(data=data.frame(observed, predicted_rf), mapping=aes(x=predicted_rf, y=observed)) + 
  geom_point()+ geom_smooth(color='#ab62e3') + theme_minimal()+
  scale_x_continuous(labels=comma, limits = c(0, 0.3)) +
  scale_y_continuous(labels=comma, limits = c(0, 0.3)) +
  # range is added here to remove outliers 
  annotate("text", x=0.3,y=0.3,size=4,hjust=1,label=paste('r =', round(cor(predicted_rf,observed), 2)))

library(grid)       # textGrob, gpar
library(gridExtra)  # grid.arrange

grid.arrange(p1, p2, nrow = 1,
             top = textGrob("Predicting passenger volume using Linear Regression and Random Forest", gp = gpar(fontsize = 15, font = 2)),
             bottom = textGrob("Data: LTA DataMall; OSM", gp = gpar(fontsize = 9, font = 3)))

Both the linear regression model and the random forest model have smaller errors than the baseline, which always predicts the mean training volume. Random forest predicts more accurately than linear regression. It has lower RMSE and MAE, a higher R-squared (0.22), and a higher correlation between predicted and observed values (0.47).
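The base-R sketch below computes the three metrics, assuming they follow `caret::postResample()`. In that function, R-squared is the squared correlation between predictions and observations, not 1 - SSE/SST. This explains two details of the table: the NA for the constant baseline, and the agreement between an R-squared of 0.22 and an r of 0.47. `regression_metrics` and the numbers below are hypothetical.

```r
# RMSE, R-squared (squared Pearson correlation) and MAE, as postResample reports them
regression_metrics <- function(pred, obs) {
  c(RMSE = sqrt(mean((pred - obs)^2)),
    Rsquared = suppressWarnings(cor(pred, obs))^2,
    MAE = mean(abs(pred - obs)))
}

obs <- c(0.10, 0.20, 0.40, 0.30)   # hypothetical scaled observations
pred <- c(0.15, 0.20, 0.30, 0.35)  # hypothetical model predictions
m <- regression_metrics(pred, obs)
m  # RMSE about 0.061, Rsquared 0.72, MAE 0.05

# A constant baseline has zero variance, so its correlation (and R-squared) is NA
b <- regression_metrics(rep(mean(obs), 4), obs)
b["Rsquared"]  # NA

# The reported r of 0.47 is consistent with R-squared = 0.2203
round(sqrt(0.2203), 2)  # 0.47
```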

summary(m_lm)
## 
## Call:
## lm(formula = .outcome ~ ., data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.35704 -0.05879 -0.03860  0.02354  0.90762 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.057670   0.002643  21.822  < 2e-16 ***
## Amenity_500        0.121005   0.047251   2.561   0.0105 *  
## Business_500       0.070550   0.029011   2.432   0.0151 *  
## Public_service_500 0.199275   0.028447   7.005 2.84e-12 ***
## Education_500      0.257754   0.026164   9.852  < 2e-16 ***
## Healthcare_500     0.163188   0.027686   5.894 4.04e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1225 on 4460 degrees of freedom
## Multiple R-squared:  0.09731,    Adjusted R-squared:  0.0963 
## F-statistic: 96.16 on 5 and 4460 DF,  p-value: < 2.2e-16
varImp(m_rf)
## rf variable importance
## 
##                    Overall
## Public_service_500  100.00
## Education_500        45.46
## Business_500         35.75
## Amenity_500          19.75
## Healthcare_500        0.00

The linear regression coefficients and the random forest feature importances support some category-wise conclusions. All five POI categories are positively associated with passenger volume, and public service and education POIs have the greatest influence. This is broadly consistent with the Pearson correlation analysis. Taken together, the correlation analysis and machine learning suggest that POI distribution has only a slight influence on total passenger volume. However, with R-squared values of at most 0.22, the models cannot verify this relationship on their own. We therefore refine the analysis in the spatial and temporal dimensions.
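By default, caret's `varImp()` rescales importances to a 0 to 100 range. The 0.00 for `Healthcare_500` therefore means it ranks last among the five features, not that it has no effect. The minimal sketch below shows that rescaling with made-up raw importances; `rescale_importance` is a hypothetical helper.

```r
# Min-max rescaling of raw importances: the smallest becomes 0, the largest 100
rescale_importance <- function(raw_imp) {
  shifted <- raw_imp - min(raw_imp)
  shifted / max(shifted) * 100
}

# Hypothetical raw importances (not the values behind the table above)
raw_imp <- c(Public_service_500 = 12, Education_500 = 7, Business_500 = 6,
             Amenity_500 = 4.5, Healthcare_500 = 3)
round(rescale_importance(raw_imp), 2)
# Public_service_500 100.00, Education_500 44.44, Business_500 33.33,
# Amenity_500 16.67, Healthcare_500 0.00
```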

2. Further Spatial Exploration at PA Scale

2.1 Overview of POI and Bus Stop Volume in PA Samples

i. POI Categories Distribution in PA Samples
select_PA <- as.data.frame (c("ANG MO KIO","BEDOK","JURONG WEST","QUEENSTOWN"))
# poi in 4 PA
tmap_mode('plot')
b <- as.list(0)
for (i in 1:nrow(select_PA)) {
  tm_shape(filter(PA, PLN_AREA_N == select_PA[i,])) +
    tm_borders() +
    tm_polygons(alpha=.35,border.alpha =.03)+
    tm_layout(title = select_PA[i,1]) +
    tm_credits("Data: LTA DataMall;OSM")+
  tm_shape(filter(joined_poi,PLN_AREA_N == select_PA[i,])) +
    tm_dots(col = "category", size=0.2) -> b[[i]]
} 

tmap_arrange(b[[1]],b[[2]],b[[3]],b[[4]])

ii. Ratio of POIs within 200-Meter-Radius of Bus Stops
# pre-processing: merged 200 m bus stop buffers for each of the four selected PAs
# (buffers are built in SVY21, EPSG:3414, so the distance is in meters)
c <- as.list(0)
d <- as.list(0)
for (i in 1:nrow(select_PA)) {
  joined_data %>%
    filter(PLN_AREA_N == select_PA[i,]) -> c[[i]]
  st_buffer(st_transform(c[[i]], 3414), 200) %>% st_union() -> d[[i]]
}
# chart: merged buffers in blue, POIs in red
e <- as.list(0)
for (i in 1:nrow(select_PA)) {
  tm_shape(filter(PA, PLN_AREA_N == select_PA[i,])) +
    tm_borders(col = "grey") +
  tm_shape(d[[i]]) +
    tm_polygons(col = "lightblue", alpha = 0.5, border.col = "white") +
  tm_shape(filter(joined_poi, PLN_AREA_N == select_PA[i,])) +
    tm_dots(col = "PLN_AREA_N", size = 0.1, palette = "Reds", title = "POI") +
    tm_layout(title = select_PA[i,1],
              legend.text.color = "white") +
    tm_credits("Data: LTA DataMall;OSM") +
  tmap_options(check.and.fix = TRUE) -> e[[i]]
}

tmap_arrange(e[[1]], e[[2]], e[[3]], e[[4]])

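library(kableExtra)  # kbl, kable_paper

# POIs per PA, and how many of them fall inside the merged bus stop buffers
ratio_tbl <- data.frame(PA = select_PA[, 1], Poi_All = NA, Poi_buffer = NA)
for (i in 1:nrow(select_PA)) {
  pa_poi <- filter(joined_poi, PLN_AREA_N == select_PA[i, 1])
  pa_poi <- st_transform(pa_poi, st_crs(d[[i]]))
  ratio_tbl$Poi_All[i] <- nrow(pa_poi)
  ratio_tbl$Poi_buffer[i] <- sum(lengths(st_intersects(pa_poi, d[[i]])) > 0)
}
ratio_tbl$Ratio <- round(ratio_tbl$Poi_buffer / ratio_tbl$Poi_All, 2)

kbl(ratio_tbl, caption = "Ratio of POIs within Bus Stop Buffers") %>%
  kable_paper("striped")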
Ratio of POIs within Bus Stop Buffers
PA            Poi_All  Poi_buffer  Ratio
ANG MO KIO    657      62          0.09
BEDOK         521      306         0.59
JURONG WEST   570      70          0.12
QUEENSTOWN    516      109         0.21
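The Ratio column divides the number of POIs inside the merged bus stop buffers by the number of POIs in the PA, rounded to two decimals. A short base-R check using the counts from the table reproduces it. `ratio_check` is an illustrative name.

```r
# Ratio = POIs inside the bus stop buffers / all POIs in the PA (counts from the table)
ratio_check <- data.frame(
  PA = c("ANG MO KIO", "BEDOK", "JURONG WEST", "QUEENSTOWN"),
  Poi_All = c(657, 521, 570, 516),
  Poi_buffer = c(62, 306, 70, 109)
)
ratio_check$Ratio <- round(ratio_check$Poi_buffer / ratio_check$Poi_All, 2)
ratio_check$Ratio                             # 0.09 0.59 0.12 0.21
ratio_check$PA[which.max(ratio_check$Ratio)]  # "BEDOK"
```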
iii. Proportions of POI Categories
library(ggpubr)  # ggarrange
f <- as.list(0)
for (i in 1:4) {
joined_poi %>% 
  group_by(PLN_AREA_N,category)  %>% 
  filter(PLN_AREA_N == select_PA[i,1]) %>% 
  mutate(Num=1) %>% 
  ggplot(aes(x="", y=Num, fill=category))+
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y", start=0) +
  labs(x=NULL, y=NULL,
    title=paste("Distribution of Each POI Category in", as.character(select_PA[i,1])))+
  scale_fill_brewer(palette="Blues") -> f[[i]]
}

ggarrange(f[[1]], f[[2]], f[[3]], f[[4]], nrow = 2, ncol = 2)

iv. Total Passenger Volumes
joined_data %>% st_drop_geometry() %>%
  filter(PLN_AREA_N %in% c("ANG MO KIO", "BEDOK", "JURONG WEST", "QUEENSTOWN")) %>%
  filter(TAP_IN < 250000) %>%
  group_by(PLN_AREA_N) %>%
  ggplot(mapping = aes(x=TAP_IN, fill=PLN_AREA_N)) + geom_histogram(alpha=0.4) + facet_wrap(~ PLN_AREA_N)+
  labs(x = NULL, y = NULL, 
       title = "Tap in Volume in Each Planning Area ",
       caption = "Data: LTA DataMall;OSM",
       fill="Planning Area") +
  theme_classic()

The graphs above explore the relationships among POI distribution, bus stop distribution, and passenger volume. The share of POIs within a 200-meter radius of bus stops is largest in Bedok, at almost 60%, and smallest in Ang Mo Kio, below 10%. The other two PAs lie in between, at roughly 10% to 25%. The pie charts of POI categories show that Bedok has the largest share of business POIs, followed by Queenstown. Both PAs also have comparatively high shares of POIs near bus stops, which probably reflects commuters' demand for convenience. The tap-in distributions show that each PA has one major bus stop with more than 100,000 tap-ins. The exception is Ang Mo Kio, the PA with the smallest ratio. The next analyses move beyond the spatial dimension to the temporal dimension.

2.2 Temporal Analysis

i. Overview of Tap-in and Tap-out Volumes

The line graph gives an overview of bus stop volumes over the day (blue: tap-in; orange: tap-out). The two lines almost overlap, so tap-in and tap-out volumes are similar in both magnitude and trend. This supports the earlier decision to use tap-in volume as a proxy for passenger volume.

# arrival & departure: total tap-in and tap-out volume per hour, over all stops and day types
geobus %>%
  group_by(TIME_PER_HOUR) %>%
  summarise(sa = sum(TOTAL_TAP_IN_VOLUME),
            sb = sum(TOTAL_TAP_OUT_VOLUME)) %>%
  ggplot(mapping = aes(x = TIME_PER_HOUR)) +
  geom_line(aes(y = sa, group = 1), color = "steelblue", size = 0.7) +
  geom_point(aes(y = sa), color = "steelblue", size = 2, alpha = 0.4) +
  geom_line(aes(y = sb, group = 1), color = "darkorange", size = 0.7) +
  geom_point(aes(y = sb), color = "darkorange", size = 2, alpha = 0.4, shape = 1) +
  labs(x = "Time", y = "Total tap in/tap out volume",
       title = "Total Number of Tap in and Tap out Volume",
       subtitle = "blue: tap in; orange: tap out",
       caption = "Data: LTA DataMall") +
  theme_classic()

3. Spatial-Temporal Analysis

To explore the relationship in the spatial-temporal dimension, we narrow the scope to individual bus stops. This section compares departure and arrival movements related to POIs of different categories. It applies POI-mobility signatures to one representative bus stop in each selected PA. Each bus stop is described by three charts. A pie chart shows the proportions of nearby POIs (within 500 meters). A braided graph compares tap-in (departure) and tap-out (arrival) volumes. A radial graph compares volumes on weekdays with those on weekends and holidays.
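Both time charts rest on simple hourly aggregations. The minimal base-R sketch below uses made-up volumes to show the quantity behind the braid's fill, which is whether tap-ins (departures) exceed tap-outs (arrivals) in each hour. It also finds a peak hour per day type, as read off the radial graph. The data frame `hourly`, its values, and the `DAY_TYPE` labels are hypothetical.

```r
# Hypothetical hourly volumes for a single stop (not LTA data)
hourly <- data.frame(
  DAY_TYPE = rep(c("WEEKDAY", "WEEKENDS/HOLIDAY"), each = 4),
  TIME_PER_HOUR = rep(c(7, 12, 18, 23), times = 2),
  TOTAL_TAP_IN_VOLUME = c(900, 300, 400, 50, 200, 250, 220, 30),
  TOTAL_TAP_OUT_VOLUME = c(300, 280, 950, 40, 180, 260, 210, 60)
)

# Braided graph: per hour (summed over day types), does tap-in exceed tap-out?
by_hour <- aggregate(cbind(TOTAL_TAP_IN_VOLUME, TOTAL_TAP_OUT_VOLUME) ~ TIME_PER_HOUR,
                     data = hourly, FUN = sum)
by_hour$in_exceeds_out <- by_hour$TOTAL_TAP_IN_VOLUME > by_hour$TOTAL_TAP_OUT_VOLUME
by_hour
# hours 7 and 12: more tap-ins; hours 18 and 23: more tap-outs

# Radial graph: the hour with the largest tap-in volume for each day type
peak_hour <- sapply(split(hourly, hourly$DAY_TYPE),
                    function(d) d$TIME_PER_HOUR[which.max(d$TOTAL_TAP_IN_VOLUME)])
peak_hour
# WEEKDAY peaks at 7, WEEKENDS/HOLIDAY at 12
```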

# select one representative bus stop in each of the four PAs
joined_data %>% 
  filter(PT_CODE %in% c("54391","84009","22009","15131")) -> select_stp
# 500 m buffers around the selected stops, built in SVY21 (EPSG:3414) so the distance is in meters
select_stp_buffer <- st_buffer(st_transform(select_stp, 3414), 500)
poi_3414 <- st_transform(poi, 3414)
# one 0/1 column per selected stop; a, b, c, d follow the row order of select_stp
poi_test <- st_drop_geometry(poi_3414)
for (i in 1:nrow(select_stp_buffer)) {
  poi_test[[letters[i]]] <- as.numeric(lengths(st_intersects(poi_3414, select_stp_buffer[i, ])) > 0)
}

# share of each POI category around each selected stop
poi_test %>%
  filter(a == 1 | b == 1 | c == 1 | d == 1) %>%
  mutate(bus_buffer = case_when(a==1 ~ "Boon Lay Int",
                                b==1 ~ "Kent Ridge Stn",
                                c==1 ~ "Bedok Int",
                                d==1 ~ "Aft Ang Mo Kio Int")) %>%
  count(bus_buffer, category) %>%
  group_by(bus_buffer) %>%
  mutate(ratio = n / sum(n)) %>%
  ggplot(mapping = aes(x = "", y = ratio, fill = category)) +
  geom_bar(width = 1, stat = "identity") +
  scale_fill_brewer(type = "seq",
                    palette = 3,
                    direction = 1,
                    aesthetics = "fill") +
  facet_wrap(~ bus_buffer) +
  coord_polar("y", start = 0)

library(ggbraid)  # geom_braid
library(ggpubr)   # ggarrange

g <- as.list(0)
h <- as.list(0)
select_stp %>% st_drop_geometry() %>% as.data.frame() -> select_stp_nongeo
for (i in 1:nrow(select_stp_nongeo)) {
  # hourly tap-in and tap-out totals for this stop, summed over day types
  bus %>%
    filter(PT_CODE == as.character(select_stp_nongeo[i,1])) %>% 
    group_by(TIME_PER_HOUR) %>% 
    summarise(s_in = sum(TOTAL_TAP_IN_VOLUME),
              s_out = sum(TOTAL_TAP_OUT_VOLUME)) -> g[[i]]

  g[[i]] %>% ggplot(mapping = aes(x = TIME_PER_HOUR)) +
    geom_line(aes(y = s_in, group = 1), color = "lightblue") +
    geom_point(aes(y = s_in), color = "lightblue", size = 2, alpha = 0.4) +
    geom_line(aes(y = s_out, group = 1), color = "lightpink1") +
    geom_point(aes(y = s_out), color = "lightpink1", size = 2, alpha = 0.4) +
    # shaded band between the two lines, colored by which volume is larger
    geom_braid(aes(x = TIME_PER_HOUR, ymin = s_in, ymax = s_out, fill = s_in > s_out), data = g[[i]], alpha = 0.2) + 
    guides(linetype = "none") +
    labs(x = NULL, y = NULL,
         title = paste("Tap in and Tap out Volume of", as.character(select_stp_nongeo[i,6])),
         fill = "tap in volume more than tap out") +
    theme_minimal() +
    coord_polar() +
    scale_x_continuous(limits = c(0,24),
                       breaks = seq(0, 24, by = 1),
                       minor_breaks = seq(0, 24, by = 1)) -> h[[i]]
}

ggarrange(h[[4]],h[[3]],h[[1]],h[[2]], nrow = 2, ncol = 2)

j <- as.list(0)
m <- as.list(0)
for (i in 1:nrow(select_stp_nongeo)) {
  bus %>%
    filter(PT_CODE == as.character(select_stp_nongeo[i,1])) %>% 
    group_by (DAY_TYPE,TIME_PER_HOUR) %>% 
    mutate(s_in=sum((TOTAL_TAP_IN_VOLUME))) -> j[[i]]
  
ggplot(data = j[[i]], mapping = aes(x=TIME_PER_HOUR, y=s_in, fill=DAY_TYPE)) +
  geom_col(color="white") +
  theme_minimal()+
  guides(fill = guide_legend(reverse = FALSE)) +
  labs(x = NULL, y = NULL,
       title = paste("Tap in Volume by Day Type at", as.character(select_stp_nongeo[i,6])),
       fill = "Weekdays / Weekends and Holidays")+ 
  coord_polar()+
  scale_x_continuous(limits = c(0,24),
                     breaks = seq(0, 24, by = 1),
                     minor_breaks = seq(0, 24, by = 1)) -> m[[i]]

}

ggarrange(m[[4]],m[[3]],m[[1]],m[[2]], nrow = 2, ncol = 2)

i. ANG MO KIO: Aft Ang Mo Kio Int (54391) - Public Services/Healthcare

Almost half of the POIs around this bus stop are public services, and business accounts for the smallest share among the four bus stops. The braided graph shows that people mostly depart from this stop during the day, while arrivals are concentrated around midnight. On weekdays, the peaks appear at noon and towards the evening.

ii. BEDOK: Bedok Int (84009) - Business/Public Services

Business is the dominant POI type around the Bedok Int bus stop. The braided graph shows that people alight at this stop in the morning and depart from it in the afternoon. Weekday peaks fall in commuting hours, while weekends have no clear peaks. Activity here happens mostly during the day and is dominated by commuting.

iii. JURONG WEST: Boon Lay Int (22009) - Public Service/Amenity

Public services are the dominant POIs around Boon Lay Int in Jurong West, followed by amenities at less than 40% of the total. Here too, people alight at this stop in the morning and depart in the afternoon. Weekday peaks fall in commuting hours, with small volumes for the rest of the day.

iv. QUEENSTOWN: Kent Ridge Stn (138647) - Amenity/Business

Amenity and business are the two major POI types, in almost equal proportions. People arrive at this bus stop continuously throughout the day. Weekday peaks again fall in commuting hours, but traffic is particularly heavy in the evening. Passenger volumes on weekends are low.

v. Further Comparison and Explanation

Peaks in commuting hours are a general pattern, but they are more pronounced at bus stops featuring business POIs. These stops also show clearer differences between weekdays and weekends, and they tend to gather people in the morning and disperse them in the evening. At other bus stops, such as those featuring public services, the patterns are harder to explain. For example, tap-in volumes at Kent Ridge Stn are always larger than tap-out volumes, while the opposite holds at Aft Ang Mo Kio Int. These stops are probably affected by other factors. For instance, healthcare and public service POIs may generate only small passenger volumes. Although they dominate in number, they may not be an important driver of volume.

Limitation

This report has the following limitations. First, the data are insufficient: the POI data contain only categories and locations. Second, unlike more flexible modes such as taxis, bus stop locations are fixed by transportation planning and thus may not accurately reflect where passengers actually want to travel.

Conclusion

In this report, through the processing and visualization of OD and POI data, we have explored the relationships between bus stops and POIs in the spatial and temporal dimensions. Although POI distribution has little effect on total passenger volumes, we can see its influence on how volumes are distributed across space and through time. The distribution of different POI categories guides people on and off buses at the corresponding stops and during certain time periods. We therefore conclude that bus stop volumes are related to POI distribution, indicating a correlation between movements and motivations. This conclusion suggests that POI planning matters for distributing traffic evenly across the city. Further investigation could examine these findings in more detail.

Appendix

Distance Distribution of Nearest POIs to Bus Stops

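library(sf)
library(dplyr)
library(ggplot2)

# Attribute table of the bus stops, without the geometry column
geostp_nogeo <- st_drop_geometry(geostp)

# Nearest POI to every bus stop (one row index into poi per stop), then the
# stop-to-POI distance and that POI's interest type and category
nearest <- st_nearest_feature(geostp, poi)
geostp_nogeo$dist <- as.numeric(st_distance(geostp, poi[nearest, ], by_element = TRUE))
geostp_nogeo$interest <- poi$interest[nearest]
geostp_nogeo$category <- poi$category[nearest]

# Mean nearest-POI distance per category, ordered from farthest to closest
geostp_nogeo %>%
  group_by(category) %>%
  summarise(m = mean(dist)) %>%
  ggplot(mapping = aes(x = reorder(category, -m), y = m)) +
  # linewidth requires ggplot2 >= 3.4.0; use size = 1.2 on older versions
  geom_line(aes(group = 1), color = "lightblue", linewidth = 1.2, linetype = 3) +
  geom_point(size = 2) +
  theme(panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank()) +
  labs(x = NULL, y = NULL,
       title = "Distance Distribution of Nearest POIs to Bus Stops/Category",
       caption = "Data: LTA DataMall;OSM")

# Distribution of nearest-POI distances per interest type (median marked),
# ordered by the mean distance of each type
geostp_nogeo %>%
  group_by(interest) %>%
  mutate(m = mean(dist)) %>%
  ungroup() %>%
  ggplot(mapping = aes(x = reorder(interest, -m), y = dist)) +
  geom_violin(draw_quantiles = c(0.5), colour = "lightblue3", fill = "lightgrey", alpha = 0.1) +
  coord_flip() +
  labs(x = NULL, y = NULL,
       title = "Distance Distribution of Nearest POIs to Bus Stops/Number",
       caption = "Data: LTA DataMall;OSM") +
  theme(
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank())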

Reference


  1. Zeng, W., C.-W. Fu, S. Müller Arisona, S. Schubiger, R. Burkhard, and K.-L. Ma. “Visualizing the Relationship Between Human Mobility and Points of Interest.” IEEE Transactions on Intelligent Transportation Systems 18, no. 8 (2017): 2271–84. https://doi.org/10.1109/TITS.2016.2639320.

  2. Guo, Diansheng, Xi Zhu, Hai Jin, Peng Gao, and Clio Andris. “Discovering Spatial Patterns in Origin-Destination Mobility Data.” Transactions in GIS 16, no. 3 (2012): 411–29. https://doi.org/10.1111/j.1467-9671.2012.01344.x.

  3. Boyandin, Ilya, Enrico Bertini, Peter Bak, and Denis Lalanne. “Flowstrates: An Approach for Visual Exploration of Temporal Origin-Destination Data.” Computer Graphics Forum 30, no. 3 (2011): 971–80. https://doi.org/10.1111/j.1467-8659.2011.01946.x.

  4. Wood, Jo, Jason Dykes, and Aidan Slingsby. “Visualization of Origins, Destinations and Flows with OD Maps.” Landmarks in Mapping, 2017, 343–62. https://doi.org/10.4324/9781351191234-30.

  5. Zeng, W., C.-W. Fu, S. Müller Arisona, A. Erath, and H. Qu. “Visualizing Waypoints-Constrained Origin-Destination Patterns for Massive Transportation Data.” Computer Graphics Forum 35, no. 8 (2015): 95–107. https://doi.org/10.1111/cgf.12778.

  6. Ibid.

  7. Yu, Changbin, Fu Ren, Qingyun Du, Zhiyuan Zhao, and Ke Nie. “Web Map-Based POI Visualization for Spatial Decision Support.” Cartography and Geographic Information Science 40, no. 3 (2013): 172–82. https://doi.org/10.1080/15230406.2013.807030.

  8. Jiang, Shan, Joseph Ferreira, and Marta C. Gonzalez. “Activity-Based Human Mobility Patterns Inferred from Mobile Phone Data: A Case Study of Singapore.” IEEE Transactions on Big Data 3, no. 2 (2017): 208–19. https://doi.org/10.1109/tbdata.2016.2631141.

  9. Carlstein, T., D. Parkes, and N. Thrift. Human Activity and Time Geography. Berkeley, CA: University of California, 1978.

  10. Eagle, Nathan, Alex (Sandy) Pentland, and David Lazer. “Inferring Friendship Network Structure by Using Mobile Phone Data.” Proceedings of the National Academy of Sciences 106, no. 36 (2009): 15274–78. https://doi.org/10.1073/pnas.0900282106.

  11. Calabrese, F., M. Colonna, P. Lovisolo, D. Parata, and C. Ratti. “Real-Time Urban Monitoring Using Cell Phones: A Case Study in Rome.” IEEE Transactions on Intelligent Transportation Systems 12, no. 1 (2011): 141–51.

  12. Caceres, N., J.P. Wideberg, and F.G. Benitez. “Deriving Origin–Destination Data from a Mobile Phone Network.” IET Intelligent Transport Systems 1, no. 1 (2007): 15. https://doi.org/10.1049/iet-its:20060020.

  13. “Proceedings of the 2011 International Workshop on Trajectory Data Mining and Analysis.” ACM Conferences. Accessed April 10, 2022. https://dl.acm.org/doi/proceedings/10.1145/2030080.

  14. Zeng, W., C.-W. Fu, S. Müller Arisona, A. Erath, and H. Qu. “Visualizing Waypoints-Constrained Origin-Destination Patterns for Massive Transportation Data.” Computer Graphics Forum 35, no. 8 (2015): 95–107. https://doi.org/10.1111/cgf.12778.

  15. Ibid.